Auto-detection of favorable and unfavorable outliers using unsupervised clustering

ABSTRACT

Methods, systems, and articles of manufacture, including computer program products, are provided for auto-detection of favorable outliers and unfavorable outliers using unsupervised clustering.

FIELD

The present disclosure generally relates to machine learning.

BACKGROUND

Many organizations may rely on enterprise software applications including, for example, enterprise resource planning (ERP) software, customer relationship management (CRM) software, and/or the like. These enterprise software applications may provide a variety of functionalities including, for example, invoicing, procurement, payroll, time and attendance management, recruiting and onboarding, learning and development, performance and compensation, workforce planning, and/or the like. Some enterprise software applications may be hosted by a cloud-computing platform such that the functionalities provided by the enterprise software applications may be accessed remotely by multiple end users. For example, an enterprise software application may be available as a cloud-based service including, for example, a software as a service (SaaS) and/or the like.

SUMMARY

Methods, systems, and articles of manufacture, including computer program products, are provided for auto-detection of favorable outliers and unfavorable outliers using unsupervised clustering.

In some embodiments, there is provided a method that includes receiving a plurality of objects; preprocessing the plurality of objects by at least normalizing one or more terms of the plurality of objects; determining, for each of the plurality of objects, an aggregate value based on the one or more terms of the plurality of objects; identifying, based on unsupervised learning clustering, at least one of a favorable outlier and an unfavorable outlier among the plurality of objects; in response to identifying an unfavorable outlier, removing the identified unfavorable outlier from the plurality of objects; and in response to removing the identified unfavorable outlier, providing at least one of the remaining plurality of objects.

In some variations, one or more of the features disclosed herein including the following features can optionally be included in any feasible combination. The unsupervised learning clustering may include clustering based on an average gap value among aggregate values. The unsupervised learning clustering may include sorting aggregate values generated for the plurality of objects and determining an average gap value among the aggregate values. The unsupervised learning clustering may include: if a gap between a first aggregate value and a second aggregate value is less than or equal to the average gap value, assigning the first aggregate value to a first cluster; and if the gap between the first aggregate value and the second aggregate value is more than the average gap value, assigning the first aggregate value to a second cluster. The preprocessing may further include identifying a first term from the one or more terms as a maximization term; and negating, before the determining of the aggregate value, the first term. The normalizing may include determining a z-score for the one or more terms for each of the plurality of objects. The determining of the aggregate value may include determining a sum of the normalized one or more terms for each of the plurality of objects. The providing of at least one of the remaining plurality of objects may include generating a user interface including an indication of the at least one of the remaining plurality of objects including the favorable outlier; and causing the generated user interface to be presented at a client device. The plurality of objects may include a plurality of bids.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive. Further features and/or variations may be provided in addition to those set forth herein. For example, the implementations described herein may be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed below in the detailed description.

DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,

FIG. 1A depicts an example of a system for detecting outliers, in accordance with some example embodiments;

FIG. 1B plots clusters including a favorable outlier and an unfavorable outlier, in accordance with some example embodiments;

FIG. 2A depicts another example of a system for detecting outliers, in accordance with some example embodiments;

FIG. 2B depicts an example process for outlier detection, in accordance with some example embodiments;

FIG. 3 depicts an example process for gap-based clustering without supervision, in accordance with some example embodiments; and

FIG. 4 depicts a block diagram illustrating a computing system 400 consistent with implementations of the current subject matter.

Like labels are used to refer to same or similar items in the drawings.

DETAILED DESCRIPTION

Detecting outliers is a challenging machine-learning problem. To illustrate, consider the following example. Pat is a senior category buyer at Acme Inc., and Pat is responsible for sourcing all base chemicals used to make a product manufactured by Acme Inc. To that end, Pat may create, via a client device, a sourcing event that triggers at a periodic interval, such as every quarter. This sourcing event checks whether other suppliers are available for some, if not all, of the base chemicals in an effort to reduce the cost of the bill of materials associated with the base chemicals for the product. This sourcing event may include a plurality of items including terms defining the requirements for each of the base chemicals, and may include identifying a plurality of candidate suppliers from a variety of locations. For example, the sourcing event may include a request for bids being sent electronically to each of the plurality of candidate suppliers, each of which is associated with a corresponding client device. In response to the request for bids, Pat may electronically receive a plurality of responses, for example in the form of bids. Once the bidding process is over, Pat may apply an optimizer to identify one or more “best” bids. This optimizer may identify the best bids based on one or more constraints. These constraints may include values, such as price, quality, lead time (e.g., time until delivery of product), and/or other factors, requirements, or constraints (which may be predefined or defined, via a user interface, by Pat or one or more entities at Acme Inc., for example).

However, some, if not all, optimizers may require Pat to manually filter out outlier bids by manually defining one or more criteria to identify the outlier bids, such that the outliers can be removed before the optimizer selects the best bid(s). Moreover, the manual filtering may be difficult given the large quantity of bids being processed and the differences in the values of the constraints. From an ERP planning perspective, the removal of outlier bids may be important, as awarding a bid to an outlier may represent awarding a bid to an ill-suited supplier.

In some embodiments, there is provided an outlier detection engine to identify outliers. In some embodiments, the outlier detection engine uses an unsupervised learning clustering algorithm to identify outliers including favorable outliers and unfavorable outliers. In some implementations, the outlier detection engine identifies one or more outliers based on some, if not all, of the constraints, such as the numerical terms of a corresponding bid, to detect a potential outlier bid among the bid responses provided by, for example, the suppliers.

Although some of the examples refer to outlier detection in the context of bids, the outlier detection including the unsupervised learning disclosed herein may be applied to other types of data as well. For example, an object, such as an electronic document or other type of data structure, may include one or more constraints (e.g., requirements, values, attributes, etc.) that can be represented numerically as a vector, an array, and/or other type of data format, such that the outlier detection including the unsupervised learning disclosed herein may be used to detect outliers in these objects as well.

FIG. 1A depicts an example of a system 100 for detecting outliers in objects, such as bids and/or the like. The system 100 may include one or more client devices 110A-C coupled to a network, such as the Internet or any other type of communication mechanism. The client devices 110A-C may each be associated with, or located at, a provider (or generator) of the object. For example, the client devices 110A-C may be associated with, or located at, a supplier providing the bid. The client devices may comprise a computer, a smart phone, or other types of processor-based devices. In the example of FIG. 1A, the client 115 may be associated with, or located at, a receiver (or processor) of the objects, such as Pat or Acme in the example above.

Referring to the previous Acme example for illustration, the client 115 may trigger a sourcing event for a plurality of items, such as chemicals. Each item may have an associated set of terms, such as a price, a quantity, a quality, etc., and these terms define the requirements (or, e.g., constraints) for each of the base chemicals. In the Acme example, the triggered sourcing event causes one or more messages to be sent to clients 110A-C to request bids. In some implementations, the bid request messages sent to the clients 110A-C are sent by client 115 via network 120. Alternatively, or additionally, the bid request messages sent to clients 110A-C are sent by server 130 via network 120 (e.g., the sourcing event is stored at server 130 for client 115 and, when triggered, causes the bid request messages to be sent to the clients 110A-C).

In response to the bid request messages being sent to (and received by) the clients 110A-C, the clients 110A-C may send responsive bids via network 120 to the server 130. Alternatively, or additionally, the bids may be sent to the client 115, which in turn provides the bids to the server 130.

The server 130, including the outlier detector 140A, may process the bids to detect outliers. In some embodiments, the outlier detector 140A detects at least one “favorable” outlier and at least one “unfavorable” outlier. Next, the optimizer 140B may select the one or more “best” bids from the received bids. In some embodiments, before this selection of the one or more “best” bids, the optimizer may remove one or more of the detected outliers. For example, the optimizer may remove one or more unfavorable outliers, and then select the one or more “best” bids.

In some implementations, the server 130 may generate a user interface including the favorable outlier and/or the unfavorable outlier, and the server 130 may cause the generated user interface to be presented at the client 115. Alternatively, or additionally, the server 130 may generate a user interface including the best bid(s), and the server 130 may cause the generated user interface to be presented at the client 115.

In some embodiments, the outlier detector 140A and/or the optimizer 140B are provided as a service, such as a SaaS on a cloud-based platform accessible via network 120 to a plurality of clients. In some embodiments, the outlier detector 140A and the optimizer 140B are incorporated into a single engine to identify optimum bids. As noted, although some of the examples refer to outlier detection in the context of bids, the outlier detection may be used with other types of objects.

In some embodiments, the server 130 may receive a plurality of objects, such as electronic bids (referred to herein as “bids”). When this is the case, the server may preprocess each of the bids. For example, each bid may include a plurality of terms, such as price, units, unit of measure, delivery dates, quality indication of the good or service, requirements, constraints, and/or other values. In some embodiments, the preprocessing may include normalizing the terms to enable comparisons. For example, the value of a price term may be normalized (e.g., standardized) to a predetermined range. To illustrate further, price term values may be normalized so that each of the price term values falls within a range of 100 to 500. Likewise, a lead time value may be normalized to a range of 5 to 20 days, and so forth. Units of measure and currency may also be normalized (e.g., converting pounds to grams, Dollars to Euros, etc.). The range for the normalization may be predefined at the server 130 and/or selected via a user interface at a client device.
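The normalization to a predetermined range described above can be sketched as a linear (min-max) rescaling. The text does not specify the mapping, so linear rescaling is an assumption. The helper names `rescale_to_range` and `pounds_to_grams` are hypothetical. Currency conversion would also need an exchange-rate source, which this sketch leaves out.

```python
from typing import Sequence

# Exact definition of the international avoirdupois pound, in grams.
GRAMS_PER_POUND = 453.59237


def rescale_to_range(values: Sequence[float], lo: float, hi: float) -> list[float]:
    """Hypothetical helper: linearly map values onto [lo, hi] (min-max scaling)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values identical: place them at the midpoint of the target range.
        return [(lo + hi) / 2.0 for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


def pounds_to_grams(pounds: float) -> float:
    """Hypothetical helper: unit normalization from pounds to grams."""
    return pounds * GRAMS_PER_POUND


print(rescale_to_range([62, 78, 126], 100, 500))  # [100.0, 200.0, 500.0]
```

Under this sketch, the lowest price maps to the bottom of the configured range and the highest price maps to the top. Intermediate prices keep their relative spacing.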

The preprocessing may also classify (e.g., identify) one or more of the terms of a bid as a minimization term or a maximization term. A term may be classified as a minimization term if, from the perspective of client 115 (who is evaluating bid messages), the term should be minimized. Examples of minimization terms include price, days to delivery, risk factor, and/or other terms that, from the perspective of the client 115, provide an optimum result when minimized. A term may be classified as a maximization term if, from the perspective of client 115 (who is evaluating bid messages), the term should be maximized. Examples of maximization terms include quality of goods and/or other terms that, from the perspective of the client 115, provide an optimum result when maximized.

In some implementations, the normalization (also referred to as standardization) may be performed using a statistical function, such as a z-score:

$z = \frac{x - \mu}{\sigma}$

where x is the value being standardized, μ is the mean, and σ is the standard deviation of the samples. The normalization may thus allow processing of terms that are on different, relative scales (e.g., prices spanning a wide range, normalized to a predetermined range of, for example, 1000 to 2000 dollars; lead times ranging from 5 to 10 days; and so forth).
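Here is a minimal sketch of the z-score standardization. It assumes the population standard deviation (dividing by n), because that choice reproduces the σ = 15.48 used for the price term in the worked example of Table 1 and Table 2. The function name `z_scores` is hypothetical.

```python
import statistics
from typing import Sequence


def z_scores(values: Sequence[float]) -> list[float]:
    """Hypothetical helper: z = (x - mu) / sigma with the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # divides by n, not n - 1
    if sigma == 0:
        return [0.0 for _ in values]  # no spread: every value sits at the mean
    return [(x - mu) / sigma for x in values]


prices = [62, 78, 84, 88, 90, 93, 94, 96, 99, 126]  # Price row of Table 1
print([round(z, 2) for z in z_scores(prices)])
# [-1.87, -0.84, -0.45, -0.19, -0.06, 0.13, 0.19, 0.32, 0.52, 2.26]
```

The printed values match the Price_Standard row of Table 2. Using `statistics.stdev` (sample standard deviation) would give slightly smaller magnitudes.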

In some embodiments, a term classified as a maximization term is normalized by negating the value of the term. For example, suppose a quality factor term varies from 1 to 10, where 10 represents the highest quality of the good being supplied. The preprocessing may flag this quality factor term as a maximization term, such that when this term is normalized, the term is also negated (e.g., −1 to −10). In this way, the highest quality becomes a minimum, such as “−10” in this example, so that the quality factor term is minimized along with the other terms being optimized, such as price.
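The minimization/maximization classification and the negation step could be represented as follows. The `Direction` enum, the `TERM_DIRECTIONS` mapping, and `to_minimization` are hypothetical names. The mapping encodes only the example classifications given above: price and lead time (days to delivery) are minimized, and quality is maximized.

```python
from enum import Enum


class Direction(Enum):
    MINIMIZE = "minimize"
    MAXIMIZE = "maximize"


# Hypothetical classification of the example terms from the buyer's perspective.
TERM_DIRECTIONS = {
    "Price": Direction.MINIMIZE,
    "Lead_Time": Direction.MINIMIZE,
    "Quality_Factor": Direction.MAXIMIZE,
}


def to_minimization(term: str, values: list[float]) -> list[float]:
    """Hypothetical helper: negate a maximization term so smaller is always better."""
    if TERM_DIRECTIONS[term] is Direction.MAXIMIZE:
        return [-v for v in values]
    return list(values)


print(to_minimization("Quality_Factor", [1, 5, 10]))  # [-1, -5, -10]
print(to_minimization("Price", [62, 78]))             # [62, 78]
```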

After the preprocessing, each bid may be further processed. For each supplier (e.g., clients 110A-C) providing a bid, the outlier detector 140A may preprocess each of the terms of the bid as noted above. Table 1 below depicts an example of 10 bids from suppliers S1-S10. Each bid includes 3 terms: price, lead time, and a quality factor. Other quantities of suppliers and types of terms may be implemented as well.

To normalize the terms of Table 1, the terms may be preprocessed as follows. The price term varies across suppliers from 62 to 126, with a mean value (μ) of 91 and a standard deviation (σ) of 15.48. The price 62 for S1 is therefore normalized to −1.87 (e.g., (62 − 91)/15.48 = −1.87). Likewise, the price 78 for S2 is normalized to −0.84 (e.g., (78 − 91)/15.48 = −0.84), and so forth, as depicted in Table 2 at the “Price_Standard” row. Table 2 depicts the price, lead time, and quality factor terms followed by the preprocessing that normalizes those values. The respective normalized/standardized values are listed in the “Price_Standard” row, the “LeadTime_Standard” row, and the “Quality_Standard” row.

TABLE 1

Terms           S1   S2   S3   S4   S5   S6   S7   S8   S9   S10
Price           62   78   84   88   90   93   94   96   99   126
Lead_Time       10    8    9   12   62    8   15    4    8     5
Quality_Factor  52   20   22   10    5   11   17   20   22    30

Although the example of Table 1 depicts 10 suppliers with 3 terms being optimized, this example is for purposes of illustration. Indeed, the outlier detector 140A may process hundreds of bids. Each bid may include hundreds if not thousands of items, and each item may include hundreds of terms (e.g., requirements). These large quantities make optimization based on the terms a computationally burdensome problem. As such, the processes disclosed herein may provide optimization in a more computationally efficient way while still maintaining the fidelity of the terms for each of the bids.

After all of the terms are normalized, the outlier detector 140A may then determine, for each supplier, a score, such as an aggregate value or other function indicative of the normalized term values of a given supplier. Referring to Table 2, for example, the aggregate value (e.g., the “Total_Weightage”) for each supplier is a sum of each of the standardized/normalized values for a given supplier. For supplier S1, for example, the aggregate, such as the Total_Weightage, is −4.63 (e.g., −1.87 + (−0.25) + (−2.51) = −4.63). Likewise, for supplier S2, the aggregate, such as the Total_Weightage, is −1.15, and so forth through the suppliers. For each supplier, the Total_Weightage represents a normalized, weighted score across the terms (e.g., price, lead time, and quality factor).

TABLE 2

Terms              S1      S2      S3      S4      S5      S6      S7      S8      S9      S10
Price              62.00   78.00   84.00   88.00   90.00   93.00   94.00   96.00   99.00   126.00
Lead_Time          10.00   8.00    9.00    12.00   62.00   8.00    15.00   4.00    8.00    5.00
Quality_Factor     52.00   20.00   22.00   10.00   5.00    11.00   17.00   20.00   22.00   30.00
Price_Standard     −1.87   −0.84   −0.45   −0.19   −0.06   0.13    0.19    0.32    0.52    2.26
LeadTime_Standard  −0.25   −0.38   −0.31   −0.13   2.95    −0.38   0.06    −0.62   −0.38   −0.56
Quality_Standard   −2.51   0.07    −0.09   0.88    1.28    0.80    0.31    0.07    −0.09   −0.73
Total_Weightage    −4.63   −1.15   −0.85   0.56    4.17    0.55    0.56    −0.23   0.05    0.97
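The following sketch recomputes Table 2 from the Table 1 data. It standardizes each term with the population standard deviation, negates the maximization term, and sums the results per supplier. The tabulated totals are reproduced exactly when each standardized value is first rounded to two decimals, so the sketch exposes that rounding as an option. At full precision, the S2 total comes out as −1.14 rather than −1.15. All function and variable names are illustrative, not taken from any actual implementation.

```python
import statistics

# Table 1: one list per term, ordered S1..S10.
SUPPLIERS = [f"S{i}" for i in range(1, 11)]
TERMS = {
    "Price": [62, 78, 84, 88, 90, 93, 94, 96, 99, 126],
    "Lead_Time": [10, 8, 9, 12, 62, 8, 15, 4, 8, 5],
    "Quality_Factor": [52, 20, 22, 10, 5, 11, 17, 20, 22, 30],
}
MAXIMIZATION_TERMS = {"Quality_Factor"}  # negated so that smaller is better


def standardize(values, decimals=None):
    """Hypothetical helper: population z-scores, optionally rounded."""
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    zs = [(x - mu) / sigma for x in values]
    return zs if decimals is None else [round(z, decimals) for z in zs]


def total_weightage(terms, maximize, decimals=None):
    """Hypothetical helper: per-supplier sum of standardized terms, maximization terms negated."""
    n = len(next(iter(terms.values())))
    totals = [0.0] * n
    for name, values in terms.items():
        sign = -1.0 if name in maximize else 1.0
        for i, z in enumerate(standardize(values, decimals)):
            totals[i] += sign * z
    return [round(t, 2) for t in totals]


# Rounding each standardized value to two decimals mirrors how Table 2 was tabulated.
table2 = dict(zip(SUPPLIERS, total_weightage(TERMS, MAXIMIZATION_TERMS, decimals=2)))
print(table2)
```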

In the example of Table 2, the quality factor term was classified, and thus identified, as a maximization term. As such, the Quality_Standard values are negated (e.g., multiplied by minus 1 (“−1”)) as part of the preprocessing to yield the normalized/standardized values, such as −2.51, 0.07, −0.09, and so forth. In this way, a term that corresponds to a maximization term is negated and thus converted into a minimization term for purposes of optimization. In other words, all of the terms being optimized are normalized/standardized so that they are minimized during optimization. This negation also means that, after clustering, data points in the left-most clusters will be potential favorable outliers and data points in the right-most clusters will be potential unfavorable outliers (as explained further below with respect to FIG. 1B).

Although the example of Table 2 negated the maximization term, the preprocessing may, alternatively, negate the minimization terms, which in this example are the Price_Standard and LeadTime_Standard values. When this is the case, the minimization terms are converted to maximization terms by negation. After clustering, data points in the right-most clusters will then be the potential favorable outliers and data points in the left-most clusters will be the potential unfavorable ones.

After the aggregate data is determined (e.g., Total_Weightage is calculated at Table 2 for each bid from each supplier), the outlier detector 140A may determine outliers, such as a favorable outlier and an unfavorable outlier. For example, the outlier detector 140A may identify the outliers based on a clustering algorithm. In some embodiments, the clustering is performed based on an unsupervised learning clustering algorithm disclosed herein. This algorithm is unsupervised in the sense that training data is not needed to train the outlier detector to cluster the data, such as the Total_Weightage data.

To illustrate, the outlier detector 140A may process the “Total_Weightage” values of Table 2 to identify outlier bids. In some embodiments, the identified outliers correspond to a favorable outlier and an unfavorable outlier. The favorable outlier represents a bid that is favorable to the buyer, so the favorable bid, although an outlier, should not be removed or filtered. For example, a given supplier may have submitted a very low price compared to others, and this low-price bid may also have a high quality factor. In this example, the outlier detector 140A should not remove this outlier because it is a favorable outlier. Instead, the outlier detector 140A may generate an indication of the favorable outlier and/or cause the favorable outlier to be presented, via a user interface, to client 115. By contrast, an unfavorable outlier represents a bid that is unfavorable to the client 115. For example, the bid may have a high price and a low quality score. In this unfavorable outlier case, the outlier detector 140A detects the unfavorable outlier and automatically filters (e.g., removes) it from further optimization processing.

In some example embodiments, the clustering may be performed based on an unsupervised learning clustering algorithm that uses gap analysis. For example, the outlier detector 140A may sort the aggregate data for the bids, such as by sorting the Total_Weightage values in ascending order. Next, the outlier detector 140A may calculate an average gap across the aggregate data. For example, for the Total_Weightage values of Table 2, the average gap may be determined as follows:

avg_gap = (range of Total_Weightage values) / (quantity of suppliers) = (maximum Total_Weightage − minimum Total_Weightage) / (quantity of suppliers).
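Here is a direct transcription of this formula as a minimal sketch. The name `average_gap` is hypothetical. As in the text, the range is divided by the number of suppliers, not by the number of gaps between sorted values (n − 1).

```python
def average_gap(aggregates: list[float]) -> float:
    """Hypothetical helper: range of the aggregate values divided by their count."""
    return (max(aggregates) - min(aggregates)) / len(aggregates)


# Sorted Total_Weightage values from Table 3.
table3 = [-4.63, -1.15, -0.85, -0.23, 0.05, 0.55, 0.56, 0.56, 0.97, 4.17]
print(round(average_gap(table3), 2))  # 0.88
```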

The outlier detector 140A may also determine the individual gap between each pair of adjacent Total_Weightage values in the sorted order. The outlier detector 140A may sequentially compare each individual gap value with the average gap. If an individual gap value is less than or equal to the average gap, then the data points are in the same cluster. If the individual gap value is greater than the average gap, the current cluster may be considered “closed” and a new cluster is formed starting with the current data sample. This process continues through all of the Total_Weightage values for all of the suppliers. At the end of the gap/outlier processing, the outlier detector 140A has formed at least one cluster, which can be used to identify favorable outliers, unfavorable outliers, etc.
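The gap-based clustering described above might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation. A gap exactly equal to the average gap keeps the point in the current cluster, matching the “smaller than or equal to” comparisons in the walkthrough of Table 3. The name `gap_clusters` is hypothetical. Clusters are returned left (most favorable) to right (least favorable).

```python
from typing import Sequence


def gap_clusters(labels: Sequence[str], aggregates: Sequence[float]) -> list[list[str]]:
    """Hypothetical helper: unsupervised gap-based clustering, ordered left to right."""
    if not aggregates:
        return []
    # Sort (label, value) pairs by aggregate value in ascending order (stable for ties).
    ordered = sorted(zip(labels, aggregates), key=lambda pair: pair[1])
    # Average gap: range of the aggregate values divided by their count.
    avg_gap = (ordered[-1][1] - ordered[0][1]) / len(ordered)

    clusters = [[ordered[0][0]]]
    for (_, prev_value), (label, value) in zip(ordered, ordered[1:]):
        if value - prev_value <= avg_gap:
            clusters[-1].append(label)  # gap within the average: same cluster
        else:
            clusters.append([label])    # gap exceeds the average: close cluster, open a new one
    return clusters


# Total_Weightage values from Table 2, in supplier order S1..S10.
suppliers = [f"S{i}" for i in range(1, 11)]
totals = [-4.63, -1.15, -0.85, 0.56, 4.17, 0.55, 0.56, -0.23, 0.05, 0.97]
print(gap_clusters(suppliers, totals))
# [['S1'], ['S2', 'S3', 'S8', 'S9', 'S6', 'S4', 'S7', 'S10'], ['S5']]
```

Because Python's `sorted` is stable, S4 precedes S7 when both totals are 0.56. This matches the order used in Table 3.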

Table 3 depicts the Total_Weightage values of Table 2 sorted in ascending order. The outlier detector 140A determines the average gap as 0.88 (e.g., (4.17 − (−4.63))/10 = 0.88).

TABLE 3

Terms            S1     S2     S3     S8     S9    S6    S4    S7    S10   S5
Total_Weightage  −4.63  −1.15  −0.85  −0.23  0.05  0.55  0.56  0.56  0.97  4.17

The outlier detector 140A iterates over the sorted Total_Weightage values as follows:

1. The individual gap between S1 and S2 is 3.48 (e.g., the absolute value of (−4.63 − (−1.15))). This is greater than the average gap of 0.88, so the outlier detector places S1 in a first cluster and forms a new cluster 2.
2. The gap between S2 and S3 is 0.30, which is smaller than or equal to the average gap, so S2 and S3 are associated with cluster 2.
3. The gap between S3 and S8 is 0.62, so cluster 2 now includes S2, S3, and S8.
4. The gap between S8 and S9 is 0.28, so cluster 2 now includes S2, S3, S8, and S9.
5. The gap between S9 and S6 is 0.50, so cluster 2 now includes S2, S3, S8, S9, and S6.
6. The gap between S6 and S4 is 0.01, so cluster 2 now includes S2, S3, S8, S9, S6, and S4.
7. The gap between S4 and S7 is 0.00, so cluster 2 now includes S2, S3, S8, S9, S6, S4, and S7.
8. The gap between S7 and S10 is 0.41, so cluster 2 now includes S2, S3, S8, S9, S6, S4, S7, and S10.
9. The gap between S10 and S5 is 3.20, which is greater than the average gap, so S5 is placed in a new cluster 3.

Each gap in steps 3 through 8 is smaller than or equal to the average gap. At the end of the iteration, the outlier detector has formed 3 clusters:

- Cluster 1 includes the bid from S1 (the left-most, or most favorable, cluster).
- Cluster 2 includes the bids from S2, S3, S8, S9, S6, S4, S7, and S10.
- Cluster 3 includes the bid from S5 (the right-most, or unfavorable, cluster).

FIG. 1B depicts an example of the clustering results of Table 3. Referring to FIG. 1B, the clustering is plotted to show the third cluster 188A including the bid from supplier S5, which in this example is considered an unfavorable outlier. The plot also depicts the second cluster 188B including the bids from S2, S3, S8, S9, S6, S4, S7, and S10. Lastly, the plot depicts the first cluster 188C including the bid from supplier S1, which in this example is considered a favorable outlier. The server 130 may generate a user interface and cause the generated user interface to be presented at a client device, such as client device 115. This generated user interface may depict one or more of the clusters 188A-C to enable identification of the favorable outlier, unfavorable outliers, and the like.

After the clustering, the outlier detector 140A selects which bids will be filtered out (e.g., removed). In some implementations, a threshold is set that defines a percentage of data samples considered outliers. The threshold may be defined at the server 130 and/or selected via a user interface presented at a client device. For example, the threshold may be set at 10%, in which case 10% of the 10 bids for suppliers S1-S10 may be identified as outliers. In this example, only one of the bids may be discarded as an outlier. Moreover, because the outlier detector distinguishes between favorable and unfavorable outliers, only an unfavorable outlier may be discarded in this example. Referring to the three clusters, the bid associated with S5 in cluster 3 is removed. In some embodiments, the client 115 receives an indication via a user interface that S1 is the most favorable outlier. In some embodiments, the remaining bids in clusters 1 and 2 are provided to the optimizer 140B for further optimization, the results of which are provided to client 115. Alternatively, or additionally, the optimizer 140B may select the optimum bid, which in this example corresponds to the bid in cluster 188C. If cluster 188C included a plurality of bids, the optimizer may generate a user interface for presentation at a client device, such that the user interface includes the bids in cluster 188C. Alternatively, or additionally, if cluster 188C included a plurality of bids, the optimizer may select the optimum bid among the bids in cluster 188C. In this example, that would be the bid with the lowest Total_Weightage value (the left-most bid).
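Below is a sketch of one possible reading of the threshold rule. The text leaves open how the percentage threshold interacts with cluster membership, so this sketch assumes a hypothetical policy:

- Whole clusters are removed from the right-most (unfavorable) end, but only while they fit within the outlier budget.
- The left-most cluster, which holds favorable outliers, is never removed.
- The optimum is the remaining bid with the lowest aggregate value.

The name `filter_unfavorable` is hypothetical.

```python
import math


def filter_unfavorable(clusters, aggregates, threshold=0.10):
    """Hypothetical policy: drop right-most clusters that fit in the outlier budget.

    clusters: lists of labels ordered left (favorable) to right (unfavorable).
    aggregates: mapping of label -> aggregate value (e.g., Total_Weightage).
    Returns (kept_labels, removed_labels, optimum_label).
    """
    n = sum(len(cluster) for cluster in clusters)
    budget = math.floor(threshold * n + 1e-9)  # e.g., 10% of 10 bids -> 1 bid
    remaining = [list(cluster) for cluster in clusters]
    removed = []
    # Never remove the left-most cluster, which holds any favorable outliers.
    while len(remaining) > 1 and len(removed) + len(remaining[-1]) <= budget:
        removed.extend(remaining.pop())
    kept = [label for cluster in remaining for label in cluster]
    # Every term was converted to "smaller is better", so the optimum is the minimum.
    optimum = min(kept, key=aggregates.__getitem__, default=None)
    return kept, removed, optimum


clusters = [["S1"], ["S2", "S3", "S8", "S9", "S6", "S4", "S7", "S10"], ["S5"]]
totals = {"S1": -4.63, "S2": -1.15, "S3": -0.85, "S4": 0.56, "S5": 4.17,
          "S6": 0.55, "S7": 0.56, "S8": -0.23, "S9": 0.05, "S10": 0.97}
kept, removed, optimum = filter_unfavorable(clusters, totals)
print(removed, optimum)  # ['S5'] S1
```

With a 10% threshold, the budget is one bid. The single-bid cluster holding S5 fits within that budget and is dropped, while the eight-bid middle cluster does not fit and is kept.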

FIG. 2A depicts another example of the server 130. In the example of FIG. 2A, the server further includes an object receiver 298A, an object preprocessor 298B, and an aggregator 298C.

The object receiver 298A may be configured to receive one or more objects, such as bids from the clients 110A-C. For example, the object receiver may receive the object and parse the received object so that the item (e.g., data) of interest remains. In the example of the object being a bid, the object receiver may parse out terms from the object, such that optimization and outlier detection are performed on the parsed terms. In the example of Tables 2 and 3, the values associated with Price, Lead_Time, and Quality_Factor remain after parsing. The object preprocessor 298B may be configured to preprocess the received objects by at least normalizing the received objects. In the case of bids, the object preprocessor 298B may prepare the bids for the outlier detector 140A by normalizing the terms, such as the data included in the bids. The aggregator 298C may be configured to determine an aggregate value, such as scores or total weighted values, for each of the objects, such as the bids.
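The sketch below illustrates the parsing performed by the object receiver 298A. It assumes a hypothetical JSON bid layout with a `supplier` field and a `terms` object. Real bids would follow whatever schema the sourcing application uses. Only the numeric terms of interest are kept.

```python
import json

# Hypothetical set of numeric terms the optimizer works with.
TERMS_OF_INTEREST = ("Price", "Lead_Time", "Quality_Factor")


def parse_bid(raw: str) -> tuple[str, dict[str, float]]:
    """Hypothetical object-receiver step: keep only the numeric terms of interest."""
    bid = json.loads(raw)
    terms = {name: float(bid["terms"][name]) for name in TERMS_OF_INTEREST}
    return bid["supplier"], terms


raw_bid = (
    '{"supplier": "S1", "currency": "USD", '
    '"terms": {"Price": 62, "Lead_Time": 10, "Quality_Factor": 52, "Notes": "net 30"}}'
)
print(parse_bid(raw_bid))
# ('S1', {'Price': 62.0, 'Lead_Time': 10.0, 'Quality_Factor': 52.0})
```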

FIG. 2B depicts an example process for outlier detection, in accordance with some example embodiments.

At 202, at least one object may be received. For example, the server 130 (e.g., the outlier detector 140A and/or the object receiver 298A) may receive at least one object, such as a bid, from at least one of the clients 110A-C. As noted above with respect to Table 1, the bids may include data terms, such as values for price, lead time (e.g., time from order of an item to delivery), quality factor (e.g., a measure of the quality or grade of the item), and/or the like. In some embodiments, the object, such as the bid, is parsed such that the items of interest (e.g., numerical data associated with price, lead time, quality factor, and the like) remain.

At 204, the at least one object may be preprocessed. For example, the server 130 (e.g., the outlier detector 140A and/or the object preprocessor 298B) may preprocess the objects, such as the bids, by normalizing the data associated with the bids. For example, the normalization may include normalizing, for each bid, one or more terms, such as the price term value, lead time value, quality, and/or the like. An example of the normalization is depicted above with respect to Table 2 at Price_Standard, Quality_Standard, and LeadTime_Standard. Moreover, the preprocessing may include negating the value of a term that is classified as a maximization term.

At 206, an aggregate value, such as the Total_Weightage, may be determined. For example, the server 130 (e.g., the outlier detector 140A and/or the aggregator 298C) may calculate the Total_Weightage as described above with respect to Table 2. For each supplier, the Total_Weightage represents a normalized, weighted score across the terms (e.g., price, lead time, and quality factor).

At 208, the server 130 (e.g., the outlier detector 140A) may determine, based on the aggregate data, outliers including favorable and unfavorable outliers. For example, the outlier detector 140A may include a clustering algorithm to identify outliers, which may include one or more unfavorable outliers and one or more favorable outliers. In some embodiments, an unsupervised learning clustering algorithm may be used for clustering. In some embodiments, the unsupervised learning clustering algorithm may include a gap analysis for the clustering.

In response to the presence or detection of an unfavorable outlier, the unfavorable outlier may be removed, at 210. The remaining data for the objects, such as the bids, may then be provided to a user interface (e.g., at client 115) and/or to the optimizer 140B for further optimization and, ultimately, selection of an object, such as a bid. For example, the server 130 may generate a user interface and cause the generated user interface to be presented at a client device, such as client device 115. This generated user interface may indicate the object having the lowest aggregate value (e.g., Total_Weightage), which, because every term has been converted into a minimization term, is the optimum object, such as the optimum bid. In some instances, this optimum bid may correspond to a favorable outlier.

FIG. 3 depicts an example process for gap-based clustering without supervision, in accordance with some example embodiments.

At 302, the aggregate values, such as the Total_Weightage, may be sorted. For example, the aggregate values may be sorted in ascending order as depicted at Table 3 above.

At 304, an average gap value may be determined among the aggregate values. Referring again to Table 3, the outlier detector 140A determines the average gap as 0.88 (e.g., (4.17 − (−4.63))/10 = 0.88), for example.

At 306, if a gap between a first aggregate value and a second aggregate value is more than the average gap value, the first aggregate value is placed in a second cluster. In the example above, the gap between S10 and S5 is 3.20, which is greater than the average gap, so the bid for S5 is included in cluster 3. At 308, if a gap between a first aggregate value and a second aggregate value is less than or equal to the average gap value, the first aggregate value is placed in a first cluster. In the example above, the outlier detector determines the individual gap between S2 and S3 as 0.30, which is smaller than or equal to the average gap, so S2 and S3 are associated with cluster 2. The gap processing may proceed through the sorted aggregate values until some, if not all, of the aggregate values are placed in a cluster. FIG. 1B depicts an example of the clusters 188A-C formed based on the unsupervised learning clustering algorithm disclosed herein.

In some implementations, there is provided auto-detection of outliers of objects, such as bids. The outlier detection may consider some, if not all, of the item terms of the object using an efficient, unsupervised learning clustering algorithm. In some implementations, favorable outliers and unfavorable outliers are distinguished and identified. In some implementations, the best (e.g., optimum) bid among all the bids is identified taking into account the bidding terms for a line item and/or after removal of certain outliers. After detecting bids as favorable or unfavorable outliers, a recommendation may be provided to a client device to indicate which bidding term most affected the bid being selected as a favorable outlier or an unfavorable outlier.
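The preprocessing and aggregation that feed the clustering are recited in the claims: a z-score per term, negation of maximization terms, and a sum of the normalized terms. A minimal sketch follows, assuming each object is a dictionary of numeric terms that share the same term names and using the population standard deviation (the disclosure does not say which variant). The section also does not say how the "most affecting" bidding term is chosen; `dominant_term` uses a hypothetical heuristic, the term with the largest absolute normalized value, purely for illustration. All names and sample values are illustrative.

```python
from statistics import mean, pstdev
from typing import Dict, Set, Tuple

# Object ID -> {term name -> value}; all objects are assumed to share term names.
Objects = Dict[str, Dict[str, float]]


def normalize(objects: Objects, maximize: Set[str]) -> Objects:
    """Z-score each term across objects, negating terms flagged for maximization."""
    if not objects:
        return {}
    term_names = next(iter(objects.values())).keys()
    result: Objects = {oid: {} for oid in objects}
    for term in term_names:
        column = [objects[oid][term] for oid in objects]
        mu, sigma = mean(column), pstdev(column)  # population std: an assumption
        for oid in objects:
            z = 0.0 if sigma == 0 else (objects[oid][term] - mu) / sigma
            result[oid][term] = -z if term in maximize else z
    return result


def aggregate(normalized: Objects) -> Dict[str, float]:
    """Aggregate value per object: the sum of its normalized terms."""
    return {oid: sum(terms.values()) for oid, terms in normalized.items()}


def dominant_term(normalized: Objects, oid: str) -> Tuple[str, float]:
    """Hypothetical heuristic: the term with the largest absolute normalized value."""
    terms = normalized[oid]
    name = max(terms, key=lambda t: abs(terms[t]))
    return name, terms[name]


if __name__ == "__main__":
    # Hypothetical bids: price is minimized as-is, quality is a maximization term.
    bids = {
        "B1": {"price": 100.0, "quality": 9.0},
        "B2": {"price": 120.0, "quality": 6.0},
        "B3": {"price": 80.0, "quality": 7.0},
    }
    norm = normalize(bids, maximize={"quality"})
    print(aggregate(norm))
    print(dominant_term(norm, "B2")[0])  # price
```

Because each term's z-scores sum to zero across the objects, the aggregate values in this sketch are centred on zero, so they spread on both sides of zero before sorting.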

FIG. 4 depicts a block diagram illustrating a computing system 400 consistent with implementations of the current subject matter. For example, the system 400 can be used to implement the client devices, the server, and/or the like.

As shown in FIG. 4, the computing system 400 can include a processor 410, a memory 420, a storage device 430, and input/output devices 440. The computing system 400 may be used at the clients or the server. For example, the server 130 may execute the outlier detector 140A and the optimizer 140B on one or more computing systems 400.

The processor 410, the memory 420, the storage device 430, and the input/output devices 440 can be interconnected via a system bus 450. The processor 410 is capable of processing instructions for execution within the computing system 400. Such executed instructions can implement one or more components of, for example, the server, the client devices, and/or the like. In some implementations of the current subject matter, the processor 410 can be a single-threaded processor. Alternately, the processor 410 can be a multi-threaded processor. The processor may be a multi-core processor having a plurality of processors or a single-core processor. The processor 410 is capable of processing instructions stored in the memory 420 and/or on the storage device 430 to display graphical information for a user interface provided via the input/output device 440.

The memory 420 is a computer-readable medium, such as volatile or non-volatile memory, that stores information within the computing system 400. The memory 420 can store data structures representing configuration object databases, for example. The storage device 430 is capable of providing persistent storage for the computing system 400. The storage device 430 can be a floppy disk device, a hard disk device, an optical disk device, a tape device, or other suitable persistent storage means. The input/output device 440 provides input/output operations for the computing system 400. In some implementations of the current subject matter, the input/output device 440 includes a keyboard and/or pointing device. In various implementations, the input/output device 440 includes a display unit for displaying graphical user interfaces.

According to some implementations of the current subject matter, the input/output device 440 can provide input/output operations for a network device. For example, the input/output device 440 can include Ethernet ports or other networking ports to communicate with one or more wired and/or wireless networks (e.g., a local area network (LAN), a wide area network (WAN), the Internet).

In some implementations of the current subject matter, the computing system 400 can be used to execute various interactive computer software applications that can be used for organization, analysis, and/or storage of data in various (e.g., tabular) formats (e.g., Microsoft Excel®, and/or any other type of software). Alternatively, the computing system 400 can be used to execute any type of software applications. These applications can be used to perform various functionalities, e.g., planning functionalities (e.g., generating, managing, editing of spreadsheet documents, word processing documents, and/or any other objects, etc.), computing functionalities, communications functionalities, etc. The applications can include various add-in functionalities (e.g., SAP Integrated Business Planning add-in for Microsoft Excel as part of the SAP Business Suite, as provided by SAP SE, Walldorf, Germany) or can be standalone computing products and/or functionalities. Upon activation within the applications, the functionalities can be used to generate the user interface provided via the input/output device 440. The user interface can be generated and presented to a user by the computing system 400 (e.g., on a computer screen monitor, etc.).

One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. Other possible input devices include touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive track pads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.

In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” Use of the term “based on,” above and in the claims is intended to mean, “based at least in part on,” such that an unrecited feature or element is also permissible.

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. For example, the logic flows may include different and/or additional operations than shown without departing from the scope of the present disclosure. One or more operations of the logic flows may be repeated and/or omitted without departing from the scope of the present disclosure. Other implementations may be within the scope of the following claims.

What is claimed is:
1. A system, comprising: at least one data processor; and at least one memory storing instructions which, when executed by the at least one data processor, result in operations comprising: receiving a plurality of objects; preprocessing the plurality of objects by at least normalizing one or more terms of the plurality of objects; determining, for each of the plurality of objects, an aggregate value based on the one or more terms of the plurality of objects; identifying, based on unsupervised learning clustering, at least one of a favorable outlier and an unfavorable outlier among the plurality of objects; in response to identifying an unfavorable outlier, removing the identified unfavorable outlier from the plurality of objects; and in response to removing the identified unfavorable outlier, providing at least one of the remaining plurality of objects.
2. The system of claim 1, wherein the unsupervised learning clustering comprises clustering based on an average gap value among aggregate values.
3. The system of claim 1, wherein the unsupervised learning clustering comprises: sorting aggregate values generated for the plurality of objects; and determining an average gap value among the aggregate values.
4. The system of claim 3, wherein the unsupervised learning clustering further comprises: if a gap between a first aggregate value and a second aggregate value is less than or equal to the average gap value, the first aggregate value is assigned to a first cluster; and if a gap between a first aggregate value and a second aggregate value is more than the average gap value, the first aggregate value is assigned to a second cluster.
5. The system of claim 1, wherein the preprocessing further comprises: identifying a first term from the one or more terms as a maximization term; and negating, before the determining of the aggregate value, the first term.
6. The system of claim 1, wherein the normalizing includes determining a z-score for the one or more terms for each of the plurality of objects.
7. The system of claim 1, wherein the determining of the aggregate value comprises determining a sum of the normalized one or more terms for each of the plurality of objects.
8. The system of claim 1, wherein the providing at least one of the remaining plurality of objects comprises: generating a user interface including an indication of the at least one of the remaining plurality of objects including the favorable outlier; and causing the generated user interface to be presented at a client device.
9. The system of claim 1, wherein the plurality of objects comprises a plurality of bids.
10. A method comprising: receiving a plurality of objects; preprocessing the plurality of objects by at least normalizing one or more terms of the plurality of objects; determining, for each of the plurality of objects, an aggregate value based on the one or more terms of the plurality of objects; identifying, based on unsupervised learning clustering, at least one of a favorable outlier and an unfavorable outlier among the plurality of objects; in response to identifying an unfavorable outlier, removing the identified unfavorable outlier from the plurality of objects; and in response to removing the identified unfavorable outlier, providing at least one of the remaining plurality of objects.
11. The method of claim 10, wherein the unsupervised learning clustering comprises clustering based on an average gap value among aggregate values.
12. The method of claim 10, wherein the unsupervised learning clustering comprises: sorting aggregate values generated for the plurality of objects; and determining an average gap value among the aggregate values.
13. The method of claim 12, wherein the unsupervised learning clustering further comprises: if a gap between a first aggregate value and a second aggregate value is less than or equal to the average gap value, the first aggregate value is assigned to a first cluster; and if a gap between a first aggregate value and a second aggregate value is more than the average gap value, the first aggregate value is assigned to a second cluster.
14. The method of claim 10, wherein the preprocessing further comprises: identifying a first term from the one or more terms as a maximization term; and negating, before the determining of the aggregate value, the first term.
15. The method of claim 10, wherein the normalizing includes determining a z-score for the one or more terms for each of the plurality of objects.
16. The method of claim 10, wherein the determining of the aggregate value comprises determining a sum of the normalized one or more terms for each of the plurality of objects.
17. The method of claim 10, wherein the providing at least one of the remaining plurality of objects comprises: generating a user interface including an indication of the at least one of the remaining plurality of objects including the favorable outlier; and causing the generated user interface to be presented at a client device.
18. The method of claim 10, wherein the plurality of objects comprises a plurality of bids.
19. A non-transitory computer-readable storage medium including instructions which, when executed by at least one data processor, cause operations comprising: receiving a plurality of objects; preprocessing the plurality of objects by at least normalizing one or more terms of the plurality of objects; determining, for each of the plurality of objects, an aggregate value based on the one or more terms of the plurality of objects; identifying, based on unsupervised learning clustering, at least one of a favorable outlier and an unfavorable outlier among the plurality of objects; in response to identifying an unfavorable outlier, removing the identified unfavorable outlier from the plurality of objects; and in response to removing the identified unfavorable outlier, providing at least one of the remaining plurality of objects.